Intro

Why hello, everyone. Welcome to my Skills Exam 3. This was a fun one. We were able to use just about everything we learned in class, and that was pretty dope. Let’s dive in.

Load and Clean Faculty Data

First off, we added our packages and our “FacultySalaries_1995.csv” data to our R script. We then cleaned it up a little. Check it out:

# Load Packages ####
library(tidyverse)
library(janitor)
library(modelr)
library(easystats)
library(broom)
# Load and Clean Faculty Salaries####
faculty <- read_csv("./FacultySalaries_1995.csv")
glimpse(faculty)
## Rows: 1,161
## Columns: 17
## $ FedID               <dbl> 1061, 1063, 1065, 11462, 1002, 1004, 1008, 1009, 1~
## $ UnivName            <chr> "Alaska Pacific University", "Univ.Alaska-Fairbank~
## $ State               <chr> "AK", "AK", "AK", "AK", "AL", "AL", "AL", "AL", "A~
## $ Tier                <chr> "IIB", "I", "IIA", "IIA", "IIA", "IIA", "IIB", "I"~
## $ AvgFullProfSalary   <dbl> 454, 686, 533, 612, 442, 441, 466, 580, 498, 506, ~
## $ AvgAssocProfSalary  <dbl> 382, 560, 494, 507, 369, 385, 394, 437, 379, 412, ~
## $ AvgAssistProfSalary <dbl> 362, 432, 329, 414, 310, 310, 351, 374, 322, 359, ~
## $ AvgProfSalaryAll    <dbl> 382, 508, 415, 498, 350, 388, 396, 455, 401, 411, ~
## $ AvgFullProfComp     <dbl> 567, 914, 716, 825, 530, 542, 558, 692, 655, 607, ~
## $ AvgAssocProfComp    <dbl> 485, 753, 663, 681, 444, 473, 476, 527, 501, 508, ~
## $ AvgAssistProfComp   <dbl> 471, 572, 442, 557, 376, 383, 427, 451, 404, 445, ~
## $ AvgProfCompAll      <dbl> 487, 677, 559, 670, 423, 477, 478, 546, 523, 503, ~
## $ NumFullProfs        <dbl> 6, 74, 9, 115, 59, 57, 20, 366, 34, 67, 8, 106, 27~
## $ NumAssocProfs       <dbl> 11, 125, 26, 124, 77, 33, 18, 354, 25, 40, 15, 42,~
## $ NumAssistProfs      <dbl> 9, 118, 20, 101, 102, 35, 30, 301, 27, 66, 19, 66,~
## $ NumInstructors      <dbl> 4, 40, 9, 21, 24, 2, 0, 66, 3, 27, 2, 58, 4, 19, 3~
## $ NumFacultyAll       <dbl> 32, 404, 70, 392, 262, 127, 68, 1109, 89, 200, 44,~
faculty <- janitor::clean_names(faculty)

faculty <- faculty %>% 
  select(-ends_with("_comp")) %>% 
  select(-starts_with("num_")) %>% 
  select(-ends_with("_comp_all")) %>% 
  select(-ends_with("_salary_all")) %>% 
  filter(tier != "VIIB")

faculty <- faculty %>% 
  rename(
    full = avg_full_prof_salary,
    assist = avg_assist_prof_salary,
    assoc = avg_assoc_prof_salary
  )

faculty <- faculty %>% 
  pivot_longer(cols = c(full,assoc,assist),
               names_to = "rank",
               values_to = "salary")
glimpse(faculty)
## Rows: 3,480
## Columns: 6
## $ fed_id    <dbl> 1061, 1061, 1061, 1063, 1063, 1063, 1065, 1065, 1065, 11462,~
## $ univ_name <chr> "Alaska Pacific University", "Alaska Pacific University", "A~
## $ state     <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", ~
## $ tier      <chr> "IIB", "IIB", "IIB", "I", "I", "I", "IIA", "IIA", "IIA", "II~
## $ rank      <chr> "full", "assoc", "assist", "full", "assoc", "assist", "full"~
## $ salary    <dbl> 454, 382, 362, 686, 560, 432, 533, 494, 329, 612, 507, 414, ~


Sweet, looks like it is all loaded and ready to go. In the assignment, we were told to recreate a box-and-whisker plot, which was a party. Here is the code for that:

# Create Box-and-Whisker####
faculty %>% 
  ggplot(aes(x=rank,y=salary,fill=rank))+
  geom_boxplot()+
  facet_grid(~tier)+
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 60),
        legend.text = element_text(size=12),
        axis.text = element_text(size=12),
        strip.text.x = element_text(size=14))


Now, that chart is beautiful, and if you want to print it off and put it on your wall, but you can do that yourself. You have the R code lol.

Build ANOVA


Our next goal was to create an ANOVA from this data. We applied this ANOVA according to State, Tier, and Rank on Salary. Here it is:

# ANOVA test according to State Tier and Rank on Salary ####
anova <- aov(salary ~ state + tier + rank, data = faculty)
summary(anova)
##               Df   Sum Sq Mean Sq F value Pr(>F)    
## state         50  5667124  113342   31.17 <2e-16 ***
## tier           2  7072361 3536180  972.56 <2e-16 ***
## rank           2 16526626 8263313 2272.66 <2e-16 ***
## Residuals   3297 11987752    3636                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 128 observations deleted due to missingness


What does this mean? You decide. Analyzing the data wasn’t part of my assignment. ;)



Load and Clean Juniper Data


We took a pivot for the next part of the assignment, working with a new data set. We loaded that data set and cleaned it up a bit. Many thanks to Zahn for providing the names of the oils so I didn’t have to type them all out. Here is the cleaning of the data:


# Load juniper_oils.csv and clean ####
oil <- read_csv("./Juniper_Oils.csv")
glimpse(oil)
## Rows: 34
## Columns: 41
## $ SampleID               <chr> "LOG-16S-SL12", "LOG-16S-SL11", "LOG-16S-CC5", ~
## $ Project                <chr> "JuniperLogs", "JuniperLogs", "JuniperLogs", "J~
## $ Amplicon               <chr> "16S", "16S", "16S", "16S", "16S", "16S", "16S"~
## $ Tree_Species           <chr> "Juniperus osteosperma", "Juniperus osteosperma~
## $ BurnYear               <dbl> 2018, 2017, 2016, 2014, 2013, 2012, 2011, 2009,~
## $ Latitude               <dbl> 41.5719, 40.9202, 37.5799, 39.9422, 41.5335, 40~
## $ Longitude              <dbl> -113.7488, -113.0017, -113.0851, -112.7181, -11~
## $ Field_Office           <chr> "Salt_Lake_3", "Salt_Lake_2", "Cedar_City", "Sa~
## $ BLM_Fire_Name          <chr> "Ridge", "Knob", "Hicks Creek", "Lion Peak", "P~
## $ Tracking_Number        <chr> "#5276 (2018)", "#5275 (2017)", "#5274 (2016)",~
## $ `alpha-pinene`         <dbl> 0.6, 1.4, 0.2, 0.3, 0.1, 0.0, 0.1, 0.2, 0.1, 0.~
## $ `para-cymene`          <dbl> 0.4, 0.3, 1.1, 0.1, 0.1, 0.1, 0.5, 0.6, 0.2, 0.~
## $ `alpha-terpineol`      <dbl> 2.3, 0.5, 0.8, 0.7, 0.2, 0.1, 2.8, 1.1, 0.1, 0.~
## $ `cedr-9-ene`           <dbl> 0.2, 1.0, 1.1, 0.7, 1.0, 0.3, 0.4, 0.5, 0.9, 0.~
## $ `alpha-cedrene`        <dbl> 1.9, 9.5, 13.2, 6.7, 8.1, 9.4, 8.7, 8.6, 35.5, ~
## $ `beta-cedrene`         <dbl> 1.2, 3.3, 3.2, 2.3, 2.9, 1.8, 2.8, 2.9, 7.6, 2.~
## $ `cis-thujopsene`       <dbl> 25.90, 22.80, 12.50, 19.98, 14.00, 9.30, 28.90,~
## $ `alpha-himachalene`    <dbl> 0.2, 0.4, 0.4, 0.3, 0.4, 0.2, 0.3, 0.3, 0.4, 0.~
## $ `beta-chamigrene`      <dbl> 0.4, 1.1, 1.8, 0.8, 0.9, 1.0, 0.5, 0.5, 1.4, 0.~
## $ cuparene               <dbl> 1.8, 1.8, 2.4, 1.7, 1.3, 2.1, 1.5, 1.7, 1.9, 2.~
## $ `compound 1`           <dbl> 0.7, 2.4, 4.8, 2.4, 2.4, 1.9, 1.0, 0.8, 2.7, 1.~
## $ `alpha-chamigrene`     <dbl> 0.0, 0.6, 1.3, 0.0, 0.7, 0.8, 0.3, 0.3, 1.0, 0.~
## $ widdrol                <dbl> 1.0, 5.2, 11.3, 7.2, 13.4, 9.6, 1.5, 2.9, 4.3, ~
## $ cedrol                 <dbl> 37.5, 14.2, 7.5, 21.2, 17.9, 14.6, 39.9, 37.3, ~
## $ `beta-acorenol`        <dbl> 3.3, 0.0, 0.0, 1.0, 1.3, 0.0, 1.5, 0.0, 0.0, 1.~
## $ `alpha-acorenol`       <dbl> 1.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.7, 0.0, 0.0, 1.~
## $ `gamma-eudesmol`       <dbl> 0.0, 3.5, 2.8, 0.0, 3.0, 1.4, 0.0, 3.2, 0.6, 0.~
## $ `beta-eudesmol`        <dbl> 2.0, 3.5, 2.0, 0.7, 3.3, 1.1, 1.2, 4.8, 0.0, 1.~
## $ `alpha-eudesmol`       <dbl> 0.6, 1.8, 1.3, 0.3, 1.4, 0.6, 0.3, 1.6, 0.0, 0.~
## $ `cedr-8-en-13-ol`      <dbl> 1.6, 10.5, 9.3, 8.2, 3.5, 17.2, 0.3, 6.4, 2.3, ~
## $ `cedr-8-en-15-ol`      <dbl> 2.4, 2.4, 1.7, 2.8, 1.4, 3.4, 0.8, 1.6, 0.8, 2.~
## $ `compound 2`           <dbl> 0.0, 0.5, 1.4, 0.7, 1.6, 2.3, 0.4, 0.9, 0.6, 1.~
## $ thujopsenal            <dbl> 2.6, 2.8, 1.8, 1.8, 2.0, 2.4, 1.5, 1.8, 0.7, 3.~
## $ Yield_percent          <dbl> 0.18, 0.29, 0.34, 0.41, 0.07, 0.54, 0.95, 0.43,~
## $ Bolt_Surface_Area_cm2  <dbl> 1687, 1236, 1866, 2794, 1726, 1313, 1580, 1295,~
## $ Raw_Exit_Holes_per_cm2 <dbl> 0.0000, 0.0081, 0.0000, 0.0054, 0.0046, 0.0008,~
## $ Raw_Exit_Holes         <dbl> 0, 10, 0, 15, 8, 1, 0, 7, 1, 10, 35, 8, NA, 11,~
## $ Living_Larvae          <dbl> 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0~
## $ ChemTotal              <dbl> 87.70, 89.50, 81.90, 80.88, 80.90, 79.60, 95.90~
## $ ChemMean               <dbl> 3.813043, 3.891304, 3.560870, 3.516522, 3.51739~
## $ YearsSinceBurn         <dbl> 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 19~
oil <- oil %>% 
  select(c("alpha-pinene","para-cymene","alpha-terpineol","cedr-9-ene","alpha-cedrene","beta-cedrene",
           "cis-thujopsene","alpha-himachalene","beta-chamigrene","cuparene","compound 1","alpha-chamigrene",
           "widdrol","cedrol","beta-acorenol","alpha-acorenol","gamma-eudesmol","beta-eudesmol","alpha-eudesmol",
           "cedr-8-en-13-ol","cedr-8-en-15-ol","compound 2","thujopsenal","YearsSinceBurn")) %>% 
  janitor::clean_names()

oil <- oil %>% 
  pivot_longer(cols = !years_since_burn,
               names_to = "chemical_id",
               values_to = "concentration")
glimpse(oil)
## Rows: 782
## Columns: 3
## $ years_since_burn <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,~
## $ chemical_id      <chr> "alpha_pinene", "para_cymene", "alpha_terpineol", "ce~
## $ concentration    <dbl> 0.6, 0.4, 2.3, 0.2, 1.9, 1.2, 25.9, 0.2, 0.4, 1.8, 0.~


It’s all loaded. Let’s play with it now.

Graph the Data


Our next steps in the assignment was to recreate a set of graphs, with “Years_Since_Burn” on the x axis and “Concentration” on the y axis. We faceted these according to “chemicalID” with the y axis scales free. Here are those charts:

# Graph with a facet ~chemical_id ####
oil %>% 
  ggplot(aes(x=years_since_burn,y=concentration))+
  geom_smooth()+
  facet_wrap(~chemical_id, scales = "free_y")+
  theme_minimal()


Again, beautiful charts. Beautiful charts.

Generalized Linear Model


Awesome. Our next step, we created a GLM to view which chemical concentrations are significantly affected by the amount of years since the burn. Here is that info:

# GLM to find significant chemicals ####
x <- oil %>% 
  glm(formula = concentration ~ years_since_burn + chemical_id,
      family = "gaussian")
summary(x)
## 
## Call:
## glm(formula = concentration ~ years_since_burn + chemical_id, 
##     family = "gaussian", data = .)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -15.5371   -0.7153   -0.2041    0.4623   24.6006  
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.219505   0.684296   0.321  0.74847    
## years_since_burn              0.004701   0.020848   0.225  0.82166    
## chemical_idalpha_cedrene     10.623529   0.899377  11.812  < 2e-16 ***
## chemical_idalpha_chamigrene   0.358824   0.899377   0.399  0.69003    
## chemical_idalpha_eudesmol     0.594118   0.899377   0.661  0.50908    
## chemical_idalpha_himachalene  0.070588   0.899377   0.078  0.93746    
## chemical_idalpha_pinene       0.023529   0.899377   0.026  0.97914    
## chemical_idalpha_terpineol    0.952941   0.899377   1.060  0.28968    
## chemical_idbeta_acorenol      0.305882   0.899377   0.340  0.73387    
## chemical_idbeta_cedrene       2.776471   0.899377   3.087  0.00209 ** 
## chemical_idbeta_chamigrene    0.705882   0.899377   0.785  0.43278    
## chemical_idbeta_eudesmol      1.788235   0.899377   1.988  0.04714 *  
## chemical_idcedr_8_en_13_ol    5.552941   0.899377   6.174 1.08e-09 ***
## chemical_idcedr_8_en_15_ol    1.452941   0.899377   1.615  0.10662    
## chemical_idcedr_9_ene         0.441176   0.899377   0.491  0.62390    
## chemical_idcedrol            19.823529   0.899377  22.041  < 2e-16 ***
## chemical_idcis_thujopsene    21.298824   0.899377  23.682  < 2e-16 ***
## chemical_idcompound_1         1.817647   0.899377   2.021  0.04363 *  
## chemical_idcompound_2         0.817647   0.899377   0.909  0.36357    
## chemical_idcuparene           1.923529   0.899377   2.139  0.03278 *  
## chemical_idgamma_eudesmol     1.264706   0.899377   1.406  0.16007    
## chemical_idpara_cymene        0.352941   0.899377   0.392  0.69485    
## chemical_idthujopsenal        1.788235   0.899377   1.988  0.04714 *  
## chemical_idwiddrol            6.135294   0.899377   6.822 1.84e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 13.75094)
## 
##     Null deviance: 36652  on 781  degrees of freedom
## Residual deviance: 10423  on 758  degrees of freedom
## AIC: 4294.5
## 
## Number of Fisher Scoring iterations: 2
x_adj <- broom::tidy(x)

x_adj <- x_adj %>% 
  filter(p.value<=.05)
print(x_adj)
## # A tibble: 10 x 5
##    term                       estimate std.error statistic  p.value
##    <chr>                         <dbl>     <dbl>     <dbl>    <dbl>
##  1 chemical_idalpha_cedrene      10.6      0.899     11.8  1.13e-29
##  2 chemical_idbeta_cedrene        2.78     0.899      3.09 2.09e- 3
##  3 chemical_idbeta_eudesmol       1.79     0.899      1.99 4.71e- 2
##  4 chemical_idcedr_8_en_13_ol     5.55     0.899      6.17 1.08e- 9
##  5 chemical_idcedrol             19.8      0.899     22.0  1.40e-83
##  6 chemical_idcis_thujopsene     21.3      0.899     23.7  3.09e-93
##  7 chemical_idcompound_1          1.82     0.899      2.02 4.36e- 2
##  8 chemical_idcuparene            1.92     0.899      2.14 3.28e- 2
##  9 chemical_idthujopsenal         1.79     0.899      1.99 4.71e- 2
## 10 chemical_idwiddrol             6.14     0.899      6.82 1.84e-11


Well, that’s all folks. Thanks for checking it out.